
OptiSeq: Optimizing Example Ordering for In-Context Learning

Bhope, Rahul Atul, Venkateswaran, Praveen, Jayaram, K. R., Isahagian, Vatche, Muthusamy, Vinod, Venkatasubramanian, Nalini

arXiv.org Artificial Intelligence

The use of in-context learning (ICL) with large language models (LLMs) has become a popular approach to achieve impressive performance in many NLP tasks (Raffel et al., 2020; Radford et al., 2019). In ICL, models are prompted during inference with task-specific examples that help condition the generated output. Unlike fine-tuning, it does not require updates to the model parameters, which offers many benefits. A common approach to selecting examples at inference time is to generate embeddings of candidate examples using a model like Sentence-BERT (Reimers, 2019) and retrieve the top-k most similar examples for a given test instance, ranking them based on distance or similarity. However, there is a distinction between ranking examples (determining how relevant they are to our test case) and ordering them (deciding how to arrange them in the prompt).
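The retrieval step the abstract describes can be sketched as follows. This is a minimal illustration, not OptiSeq's method: in practice the embeddings would come from a model such as Sentence-BERT, whereas here the vectors and example texts are invented so the snippet is self-contained.

```python
# Toy sketch of embedding-based example retrieval for ICL prompts.
# Real systems embed candidates with a model like Sentence-BERT;
# the vectors below are hand-made for illustration.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def retrieve_top_k(test_emb, candidates, k):
    """Rank candidate examples by cosine similarity to the test input."""
    ranked = sorted(candidates, key=lambda c: cosine(test_emb, c["emb"]), reverse=True)
    return ranked[:k]

candidates = [
    {"text": "example A", "emb": [1.0, 0.0, 0.0]},
    {"text": "example B", "emb": [0.9, 0.1, 0.0]},
    {"text": "example C", "emb": [0.0, 1.0, 0.0]},
]
top = retrieve_top_k([1.0, 0.05, 0.0], candidates, k=2)
print([c["text"] for c in top])
```

Note that this produces a similarity *ranking*; the distinction the paper draws is that how those k examples are then *ordered* in the prompt is a separate decision.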


CMAViT: Integrating Climate, Management, and Remote Sensing Data for Crop Yield Estimation with Multimodal Vision Transformers

Kamangir, Hamid, Sams, Brent S., Dokoozlian, Nick, Sanchez, Luis, Earles, J. Mason

arXiv.org Artificial Intelligence

Crop yield prediction is essential for agricultural planning but remains challenging due to the complex interactions between weather, climate, and management practices. To address these challenges, we introduce a deep learning-based multi-modal model called Climate-Management Aware Vision Transformer (CMAViT), designed for pixel-level vineyard yield predictions. CMAViT integrates both spatial and temporal data by leveraging remote sensing imagery and short-term meteorological data, capturing the effects of growing season variations. Additionally, it incorporates management practices, which are represented in text form, using a cross-attention encoder to model their interaction with time-series data. This multi-modal transformer, tested on a large dataset from 2016-2019 covering 2,200 hectares and eight grape cultivars with more than 5 million vines, outperforms traditional models like UNet-ConvLSTM, excelling in capturing spatial variability and predicting yield, particularly for extreme values in vineyards. CMAViT achieved an R2 of 0.84 and a MAPE of 8.22% on an unseen test dataset. Masking specific modalities lowered performance: excluding management practices, climate data, and both reduced R2 to 0.73, 0.70, and 0.72, respectively, and raised MAPE to 11.92%, 12.66%, and 12.39%, highlighting each modality's importance for accurate yield prediction. Code is available at https://github.com/plant-ai-biophysics-lab/CMAViT.


Calibrate to Discriminate: Improve In-Context Learning with Label-Free Comparative Inference

Cheng, Wei, Wang, Tianlu, Ji, Yanmin, Yang, Fan, Tan, Keren, Zheng, Yiyu

arXiv.org Artificial Intelligence

While in-context learning with large language models (LLMs) has shown impressive performance, we have discovered a unique miscalibration behavior where both correct and incorrect predictions are assigned the same level of confidence. We refer to this phenomenon as indiscriminate miscalibration. We found that traditional calibration metrics, such as Expected Calibration Error (ECE), are unable to capture this behavior effectively. To address this issue, we propose new metrics to measure the severity of indiscriminate miscalibration. Additionally, we develop a novel in-context comparative inference method to alleviate miscalibration and improve classification performance. Through extensive experiments on five datasets, we demonstrate that our proposed method can achieve more accurate and calibrated predictions compared to regular zero-shot and few-shot prompting.
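The ECE metric the abstract refers to can be sketched as below. This is a minimal textbook-style implementation with invented data, and it also illustrates the paper's point: a model that is equally confident on correct and incorrect predictions can still score an ECE near zero whenever its average confidence happens to match its average accuracy.

```python
# Minimal sketch of Expected Calibration Error (ECE): bucket predictions
# by confidence and compare average confidence to accuracy in each bin.
# The data below is invented for illustration.

def expected_calibration_error(confidences, correct, n_bins=10):
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    n = len(confidences)
    ece = 0.0
    for b in bins:
        if not b:
            continue
        avg_conf = sum(c for c, _ in b) / len(b)
        acc = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(avg_conf - acc)
    return ece

# "Indiscriminate" pattern: identical confidence on correct and
# incorrect predictions, yet ECE is exactly zero here because the
# shared confidence (0.75) equals the overall accuracy (3/4).
confs = [0.75, 0.75, 0.75, 0.75]
correct = [True, True, True, False]
print(expected_calibration_error(confs, correct))  # 0.0
```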


Decomposing Label Space, Format and Discrimination: Rethinking How LLMs Respond and Solve Tasks via In-Context Learning

Long, Quanyu, Wu, Yin, Wang, Wenya, Pan, Sinno Jialin

arXiv.org Artificial Intelligence

In-context Learning (ICL) has emerged as a powerful capability alongside the development of scaled-up large language models (LLMs). By instructing LLMs using few-shot demonstrative examples, ICL enables them to perform a wide range of tasks without updating millions of parameters. However, the precise contributions of demonstrations towards improving end-task performance have not been thoroughly investigated in recent analytical studies. In this paper, we empirically decompose the overall performance of ICL into three dimensions, label space, format, and discrimination, and we evaluate four general-purpose LLMs across a diverse range of tasks. Counter-intuitively, we find that the demonstrations have a marginal impact on provoking discriminative knowledge of language models. However, ICL exhibits significant efficacy in regulating the label space and format, which helps LLMs to respond in desired label words. We then demonstrate that this ability functions similarly to detailed instructions for LLMs to follow. We additionally provide an in-depth analysis of the mechanism of retrieval helping with ICL and find that retrieving the most semantically similar examples notably boosts the model's discriminative capability.
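The "regulating the label space" effect described above can be illustrated with a toy sketch: restricting a model's free-form output scores to the task's allowed label words so the response lands in the desired label space. The score table and label words here are invented; this is not the paper's evaluation code.

```python
# Hedged sketch of label-space regulation: given a model's scores over
# arbitrary candidate strings, restrict the decision to the task's
# label words. Scores below are invented for illustration.

def predict_in_label_space(scores, label_words):
    """Pick the highest-scoring candidate among the allowed label words."""
    return max(label_words, key=lambda w: scores.get(w, float("-inf")))

# Without the restriction, the model's top string ("great") is not a
# valid label for this sentiment task; with it, the answer is usable.
scores = {"great": 2.1, "positive": 1.7, "negative": 0.4, "terrible": 1.9}
labels = ["positive", "negative"]
print(predict_in_label_space(scores, labels))  # "positive"
```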


Beyond Confidence: Reliable Models Should Also Consider Atypicality

Yuksekgonul, Mert, Zhang, Linjun, Zou, James, Guestrin, Carlos

arXiv.org Artificial Intelligence

While most machine learning models can provide confidence in their predictions, confidence is insufficient to understand a prediction's reliability. For instance, the model may have a low confidence prediction if the input is not well-represented in the training dataset or if the input is inherently ambiguous. In this work, we investigate the relationship between how atypical (rare) a sample or a class is and the reliability of a model's predictions. We first demonstrate that atypicality is strongly related to miscalibration and accuracy. In particular, we empirically show that predictions for atypical inputs or atypical classes are more overconfident and have lower accuracy. Using these insights, we show incorporating atypicality improves uncertainty quantification and model performance for discriminative neural networks and large language models. In a case study, we show that using atypicality improves the performance of a skin lesion classifier across different skin tone groups without having access to the group attributes. Overall, we propose that models should use not only confidence but also atypicality to improve uncertainty quantification and performance. Our results demonstrate that simple post-hoc atypicality estimators can provide significant value.
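One simple form a post-hoc atypicality estimator could take is distance to the nearest class centroid in feature space: samples far from every class's training data are rarer and less well-represented, so their confident predictions may deserve extra scrutiny. This sketch is an assumption-laden stand-in for the paper's estimators, with invented two-dimensional features.

```python
# Hedged sketch of a simple post-hoc atypicality estimator: score a
# sample by its distance to the nearest class centroid. Higher scores
# mark rarer inputs. Features and labels are invented for illustration.
import math

def centroids(features_by_class):
    """Mean feature vector per class."""
    out = {}
    for label, feats in features_by_class.items():
        dim = len(feats[0])
        out[label] = [sum(f[d] for f in feats) / len(feats) for d in range(dim)]
    return out

def atypicality(x, cents):
    """Distance to the nearest class centroid (higher = more atypical)."""
    return min(math.dist(x, c) for c in cents.values())

train = {
    "cat": [[0.0, 0.0], [0.2, 0.0]],
    "dog": [[1.0, 1.0], [1.2, 1.0]],
}
cents = centroids(train)
print(atypicality([0.1, 0.0], cents))  # typical: near the "cat" centroid
print(atypicality([5.0, 5.0], cents))  # atypical: far from both classes
```

A recalibration scheme along the paper's lines could then adjust or flag confidence for samples whose atypicality exceeds a threshold fit on held-out data.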


A Real-World WebAgent with Planning, Long Context Understanding, and Program Synthesis

Gur, Izzeddin, Furuta, Hiroki, Huang, Austin, Safdari, Mustafa, Matsuo, Yutaka, Eck, Douglas, Faust, Aleksandra

arXiv.org Artificial Intelligence

Pre-trained large language models (LLMs) have recently achieved better generalization and sample efficiency in autonomous web automation. However, performance on real-world websites has still suffered from (1) open-domainness, (2) limited context length, and (3) lack of inductive bias on HTML. We introduce WebAgent, an LLM-driven agent that learns from self-experience to complete tasks on real websites following natural language instructions. WebAgent plans ahead by decomposing instructions into canonical sub-instructions, summarizes long HTML documents into task-relevant snippets, and acts on websites via Python programs generated from those snippets. We design WebAgent with Flan-U-PaLM, for grounded code generation, and HTML-T5, a new pre-trained LLM for long HTML documents that uses local and global attention mechanisms and a mixture of long-span denoising objectives, for planning and summarization. We empirically demonstrate that our modular recipe improves the success rate on real websites by over 50%, and that HTML-T5 is the best model for various HTML understanding tasks, achieving an 18.7% higher success rate than the prior method on the MiniWoB web automation benchmark and SoTA performance on Mind2Web, an offline task planning evaluation.
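The "summarize long HTML into task-relevant snippets" step can be illustrated with a toy stand-in. The real system uses a learned summarizer (HTML-T5); here a simple token-overlap score ranks invented HTML fragments against the instruction, purely to show the shape of the step.

```python
# Toy sketch of reducing a long HTML document to task-relevant snippets.
# The real WebAgent uses a pre-trained summarizer (HTML-T5); this
# token-overlap heuristic and the HTML fragments are invented stand-ins.
import re

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def relevance(snippet, instruction):
    """Fraction of instruction tokens that also appear in the snippet."""
    snip_tokens = set(tokenize(snippet))
    instr_tokens = tokenize(instruction)
    return sum(1 for t in instr_tokens if t in snip_tokens) / len(instr_tokens)

def top_snippets(snippets, instruction, k=2):
    return sorted(snippets, key=lambda s: relevance(s, instruction), reverse=True)[:k]

html_snippets = [
    "<button id='search'>search flights</button>",
    "<div class='footer'>contact us</div>",
    "<input name='destination' placeholder='destination city'>",
]
print(top_snippets(html_snippets, "search flights to destination", k=2))
```

The selected snippets would then be handed to the code-generation model, which emits the program that acts on the page.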


Delivery robot maker Starship Technologies cuts 11% of workforce

#artificialintelligence

Starship Technologies, one of the earlier companies to enter the outdoor robot delivery market, recently laid off 11% of its global workforce. The company, which has engineering headquarters in Estonia and business headquarters in San Francisco, said it has been negatively impacted by the "dramatic downward shifts" in the global economy and investment market. While it's unclear exactly how many employees Starship has, a LinkedIn search finds 622 people list Starship as their current employer. On top of the layoffs, Starship is closing a small number of unnamed service locations in the U.S. and Germany over the next two months. It said all of the changes focus on cost savings and improving profitability.


Charlene Chambliss: From Psychology to Natural Language Processing and Applied Research

#artificialintelligence

Interest in data science has been exponentially increasing over the past decade, and more and more people are working towards making a career switch into the field. In 2020, articles and YouTube videos about transitioning into a career in data science abound. Yet, for a lot of people, many key questions about this switch still remain: How do you break into data science from a social science background? And what are some of the most important skills in fields like psychology that can be applied to data science? Charlene Chambliss has an inspiring and non-traditional career path.